Quality Data, Quality Decisions: Why Web Scraping is Essential for Advanced Analytics
Gediminas Rickevičius·9 min


Technology companies currently make a deliberate effort to avoid holding Personally Identifiable Information (PII) about you. They do this because holding PII puts them under strict regulatory supervision, which eventually increases the cost of their business model. GDPR (General Data Protection Regulation) is one such regulation, introduced in the European Union recently for all companies utilizing user data. The regulators are catching up as companies' computer systems become smarter, with more processing speed and a growing ability to analyze user data and infer content from it.
Let's take another example, LinkedIn. Suppose you suddenly change your behavior from being a passive user, visiting the platform once in a while to reply to a message or respond to some contact requests, to one where you start reaching out to potential headhunters with meeting requests or submitting your resume. This change in your activity on the platform will signal to LinkedIn that your behavior is changing and that you may be looking for a new job. LinkedIn is able to infer this contextual data simply by analyzing the change in your activity patterns, even if you never mentioned in any of your messages that you were looking for a job. Other social media platforms like Facebook and portals like Google employ similar techniques to generate metadata, which is then used to bring you customized products and services.
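To make the idea of inferring intent from activity patterns concrete, here is a minimal sketch in Python. It is not LinkedIn's actual method, which is not public; the function name `activity_shift`, the weekly counts, and the threshold of 3x are all hypothetical choices made for illustration. It simply compares a user's recent activity with that same user's earlier baseline.

```python
from statistics import mean


def activity_shift(weekly_actions, recent_weeks=2, ratio_threshold=3.0):
    """Return True when recent activity is far above the user's own baseline.

    weekly_actions: weekly counts of outreach actions (messages sent,
    resumes submitted, and so on), oldest first. All values are hypothetical.
    """
    if len(weekly_actions) <= recent_weeks:
        return False  # not enough history to form a baseline

    baseline = mean(weekly_actions[:-recent_weeks])
    recent = mean(weekly_actions[-recent_weeks:])

    # Treat a baseline below 1 action per week as 1, so that a fully
    # passive user is not flagged by a single stray action.
    return recent >= ratio_threshold * max(baseline, 1.0)


passive_then_active = [1, 0, 2, 1, 1, 9, 12]  # sudden burst of outreach
steady = [1, 0, 2, 1, 1, 2, 1]                # no real change

print(activity_shift(passive_then_active))  # True
print(activity_shift(steady))               # False
```

Note that nothing in this sketch reads the content of a message. The signal comes entirely from metadata, which is the point of the LinkedIn example.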
Technology companies have been very good at monetizing your metadata, so we need a regulatory framework to monitor the use of such data. This also brings to the forefront the notion of differential privacy, which, as used here, means giving a company or person only the minimum relevant information without revealing the complete details. (Strictly speaking, the term refers to a mathematical guarantee obtained by adding calibrated noise to data or query results; the idea described here is closer to data minimization.) In the context of finance, let's say you want to open an account with a trading brokerage, and they need to know whether you are an American citizen because there are stringent regulatory and compliance requirements if you are one. It's good enough for the brokerage to know that you are not an American citizen - you could be of any other nationality. Differential privacy allows for this: you have given the necessary information but not the complete details.
Looking at another example, let's say you are signing up online for a service on a website that requires you to confirm that you are not under 18, but doesn't care whether you are 25 or 38. Giving your exact age would add to the PII that can be traced back to you. Differential privacy lets you avoid sharing your complete information, and as long as strong data rules are in place for the company in question, it will be able to rely solely on this limited information. Differential privacy is being led by companies that are not interested in monetizing your data. Apple, for example, whose revenue is driven more than 80% by hardware sales of its computers and wearable devices, has little economic incentive to make money off your data.
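The brokerage and age-check examples can be sketched in a few lines of Python. This is only an illustration of the minimal-disclosure idea described in this article, not a formal differential privacy mechanism, and the names `age_on` and `minimal_disclosure` and the sample values are assumptions made for the example. The user's side computes yes/no answers locally and hands over only those answers.

```python
from datetime import date


def age_on(birth_date, today):
    """Return the age in whole years on the given day."""
    years = today.year - birth_date.year
    # Subtract one year if the birthday has not happened yet this year.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def minimal_disclosure(birth_date, citizenship, today):
    """Return only the yes/no facts the service needs (hypothetical helper).

    The exact birth date and the actual nationality stay with the user.
    """
    return {
        "is_adult": age_on(birth_date, today) >= 18,
        "is_us_citizen": citizenship == "US",
    }


claims = minimal_disclosure(date(1990, 6, 15), "CA", today=date(2024, 1, 1))
print(claims)  # {'is_adult': True, 'is_us_citizen': False}
```

The service learns that the user is an adult and not an American citizen, but it cannot tell whether the user is 25 or 38, or which other nationality they hold.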
On the other side of the spectrum, you have tech giants like Amazon, Facebook or Google, which are much more interested in gathering as much information about their users as possible, and as complete as possible, since that's what their business models are based on. They will have a much bigger challenge in enforcing differential privacy. A split, therefore, is imminent: companies that do not monetize your data will support differential privacy, while companies that make money off your data will look to data sovereignty as a governance model for their customers.
Faisal is based in Canada with a background in Finance/Economics & Computers. He has been actively trading FOREX for the past 11 years. Faisal is also an active stocks trader with a passion for everything Crypto. His enthusiasm for and interest in learning new technologies have turned him into an avid Crypto/Blockchain & Fintech enthusiast. He currently works for a mobile platform called Tradelike as the Senior Technical Analyst. His interest in writing has stayed with him ever since he started the first Internet magazine of Pakistan in 1998. He blogs regularly on financial markets, trading strategies & cryptocurrencies. He loves to travel.